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of about .5 showed Imperfect separation of wheat from non-wheat. This 
result points to the need to explore the achievable crop separability 
in the spectral/temporal domain and suggests evaluating derived features, 
rather than data channels, as a means to achieve purer spectral strata. 






PREFACE 

This report describes part of a comprehensive and continuing program
of research concerned with advancing the state-of-the-art in remote
sensing of the environment from aircraft and satellites. The research
is being carried out for NASA's Lyndon B. Johnson Space Center (JSC),
Houston, Texas, by the Environmental Research Institute of Michigan
(ERIM). The basic objective of this multidisciplinary program is to
develop remote sensing as a practical tool to provide the planner and
decision-maker with extensive information quickly and economically.

Timely information obtained by remote sensing can be important to
such people as the farmer, the city planner, the conservationist, and
others concerned with problems such as crop yield and disease, urban
land studies and development, water pollution, and forest management.

The scope of our program includes:

1. Extending the understanding of basic processes. 

2. Discovering new applications, developing advanced remote 
sensing systems, and improving automatic data processing 
to extract information in a useful form. 

3. Assisting in data collection, processing, analysis, and 
ground truth verification. 

The research described in this Technical Memorandum was performed 
under NASA Contract NAS9-15476 during the period from December 15, 1978, 
through June 15, 1979. I. Dale Browne/SF3 was the NASA Contract Techni- 
cal Monitor. The program was directed by Richard R. Legault, Vice 
President of ERIM and Head of the Infrared and Optics Division, Quentin A. 
Holmes, Program Manager, and Robert Horvath, Head of the Analysis Department. 



The work has benefited from technical discussions with Richard J. 
Kauth, who derived the original reduction of variance factor that is 
used as one of the performance measures. I was inspired to explore 
the tolerance block approach to clustering by the lively interest of 
Richard C. Cicone, who, in addition, contributed creative ideas and 
editorial assistance. W. Frank Pont contributed to my understanding 
of stratification in a finite sampling environment. His memorandum on 
that subject is included as Appendix B. I gratefully acknowledge the 
help of these co-workers. 

It is obvious, but easily overlooked, that this study owes its
existence to the supply of good quality Landsat digitized data from
Goddard Space Flight Center and Johnson Space Center. Also essential
was the pixel-by-pixel ground truth supplied for many LACIE segments,
which has made it possible to draw conclusions about the relative
performance of clustering methods.

INTRODUCTION


This memorandum describes a study whose purpose is to find improved
methods of spectral stratification in the context of Procedure M, a system
for estimating the acreage of an agricultural crop, such as wheat, from
digitized Landsat data [1]. The development of this procedure was
stimulated and supported by the Large Area Crop Inventory Experiment (LACIE).

Procedure M as applied to wheat recognition 

1. screens and transforms Landsat pixel data from a LACIE segment; 

2. clusters the pixels into field-like groups called "quasi-fields"
that are homogeneous spectrally and spatially;

3. clusters the quasi-fields spectrally into strata;

4. labels sample quasi-fields from the strata "wheat" or
"non-wheat"; and

5. from these labels, forms a stratified sample estimate of the 
percent wheat in the segment. 

Step 3, the clustering of quasi-fields into strata, is designed to 
separate wheat from non-wheat strata and thereby achieve a sampling ef- 
ficiency. By this, we mean that a smaller stratified sample will give 
the same accuracy as an unstratified sample. Another way of putting it 
is that when the two samples are the same size, the stratified estimate 
is more accurate. 
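
As a rough illustration of step 5, the sketch below forms a stratified estimate by weighting each stratum's sample wheat fraction by the stratum's share of the segment's pixels. It is a simplification and not the Procedure M estimator: it assumes all sample quasi-fields are the same size, and the function name and data layout are hypothetical.

```python
def stratified_estimate(strata):
    """Estimate the wheat proportion of a segment from labeled samples.

    strata is a list of (n_i, labels_i) pairs, where n_i is the number of
    pixels in stratum i and labels_i holds one label per sampled
    quasi-field (1 for wheat, 0 for non-wheat). Equal-sized sample
    quasi-fields are assumed (an illustrative simplification).
    """
    n = sum(n_i for n_i, _ in strata)
    estimate = 0.0
    for n_i, labels in strata:
        p_hat_i = sum(labels) / len(labels)   # sample wheat fraction
        estimate += (n_i / n) * p_hat_i       # weight by stratum size
    return estimate


# Hypothetical segment with three strata of 6000, 3000 and 1000 pixels.
example = [(6000, [1, 1, 1]), (3000, [0, 0]), (1000, [1, 0])]
percent_wheat = 100 * stratified_estimate(example)
```

With pure strata (the first two in the example) every sample label agrees, which is why a small sample from such strata already pins down their contribution.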

The grouping of pixels into quasi-fields has been largely successful.
Figure 1 is a histogram of the percent wheat in quasi-field interiors.
(The interiors consist of pixels faced on all four sides by pixels from
the same quasi-field.) This histogram was compiled over all
quasi-fields that have interiors from 12 Kansas segments, three
acquisitions each. Most of the quasi-fields have less than 10% or more
than 90% wheat. Between 10% and 90% wheat, there is only a small
scattering of quasi-fields.
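
The definition of an interior pixel can be made concrete with a minimal sketch. It assumes the quasi-field map is stored as a list of rows of integer ids, and that pixels on the image border count as edge pixels; both choices, and the name `interior_mask`, are illustrative rather than taken from Procedure M.

```python
def interior_mask(labels):
    """Mark pixels whose four edge-adjacent neighbours share their id.

    labels is a list of rows; each entry is the quasi-field id of a pixel.
    Border pixels lack a neighbour on at least one side and are treated
    as edge pixels here (an assumption of this sketch).
    """
    rows, cols = len(labels), len(labels[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            k = labels[r][c]
            mask[r][c] = (labels[r - 1][c] == k and labels[r + 1][c] == k
                          and labels[r][c - 1] == k and labels[r][c + 1] == k)
    return mask


# Hypothetical 4 x 5 map containing three quasi-fields.
labels = [
    [1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2],
    [3, 3, 3, 2, 2],
]
mask = interior_mask(labels)
interior_count = sum(sum(row) for row in mask)
```

In this small map only the pixel at row 1, column 1 is interior; quasi-fields 2 and 3 have no interior at all, which is why the histogram is restricted to quasi-fields that have interiors.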





The picture would not be as pretty if we included edge pixels (i.e., 
those that are not interior) in the quasi-fields but we would not expect 
it to be. Edge pixels are often crossed by field boundaries and are the 
ones that suffer most from misregistration. 

The corresponding histogram for strata (Figure 2) shows some mixing 
of wheat and non-wheat quasi-fields. To make this histogram comparable 
to the other, the stratum count is weighted by the number of quasi-fields 
in each stratum. Also for comparability, the histogram is based on quasi- 
field interior ground truth. So whatever fuzziness is in this histogram 
is not caused by edge pixels. 

A big group of non-wheat quasi-fields is put together into relatively
pure strata. The group is not as big as in the quasi-field histogram,
for when we compare the two figures, we see that some of the 0 to
10 percent quasi-fields in the quasi-field histogram have spilled over
into the 10 to 20 and 20 to 30 percent bins in the stratum histogram.
Similarly, the stack of wheat quasi-fields is spread out into the 80 to
90 and the 70 to 80 bins.

The stratification was carried out by our unsupervised clustering
algorithm BCLUST [2]. The question we are considering is whether
stratification can be improved by a better clustering algorithm.

One problem with BCLUST is its tendency to produce a few large
clusters and many small ones. Figure 3 shows a typical distribution of
pixels in a 40-cluster stratification. We try to sample in proportion
to the size of the strata because this is the best rule when the stratum
wheat proportions are unknown. But in the BCLUST stratification, the big
strata are multiply sampled and many small strata are unsampled. Leaving
the small strata out would create a bias, so we combine the zero-allocation
strata into one wastebasket stratum and sample from it proportional to
size. (But we require at least one quasi-field in the sample.) We cannot
expect that this wastebasket stratum will be pure, so the sampling
from it is inefficient.


Figure 3. PIXEL DISTRIBUTION, SEGMENT 1165



The large strata do not have sampling problems if they truly separate
wheat from non-wheat. But if they are so large that they mix up the
wheat and non-wheat quasi-fields then it would be better to divide them
further into smaller strata, more localized spectrally and more
homogeneous with respect to crop type.

A good clustering algorithm that produced more uniformly sized strata
might improve on the stratification performance of BCLUST. In Section 2,
we define two candidate algorithms. In Section 3, we define a performance
measure for comparing the three algorithms and in Section 4, describe an
experiment to carry out the comparison.

TWO CLUSTERING ALGORITHMS BASED ON TOLERANCE BLOCKS


An approach to defining a clustering algorithm producing equal-sized
clusters is the use of tolerance blocks, an idea suggested to us by
R. P. Heydorn [3]. "Tolerance blocks" are equally-populated regions
of spectral space constructed as follows. We decide on a small number
of channels, t_1, t_2, ..., to generate the blocks. We consider the first
channel t_1 and order all the quasi-fields according to this channel. We
separate this ordered group of quasi-fields into n_1 equal-sized
subgroups, equal in the sense of having approximately the same number of
pixels (Figure 4). Then we consider each subgroup in turn, order it
according to our next channel t_2, and divide it into n_2 smaller
subgroups (Figure 5). We can now consider each one of these smaller
subgroups, order it according to our third channel t_3, and divide it
into n_3 still smaller subgroups. We keep this up for all the generating
channels specified. The final subgroups are the tolerance blocks,
n_1 x n_2 x ... in all.

Not all channels need be included in this process. If the same set
of channels is used in a different order, the tolerance blocks produced
are not necessarily the same. (The results, however, were very similar
in our tests.) When channel t_2 is used to divide the first set of
subgroups, the points of division will, in general, be different from
subgroup to subgroup (column to column in Figure 5). Because we don't cut
any quasi-fields in half, but rather assign them to one subgroup or
another, the equality of the pixel size of the subgroups can only be
approximate.
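
The successive cuts can be sketched as follows. This is not the code of Appendix A: the data layout (a dict with "means" and "pixels"), the function names, and the rule for placing a quasi-field that straddles a cut (it goes to the subgroup containing the midpoint of its pixel range) are all assumptions made for illustration.

```python
def split_equal_pixels(group, ch, n_div):
    """Order a group on channel ch and cut it into n_div subgroups of
    approximately equal pixel count, never splitting a quasi-field."""
    parts = [[] for _ in range(n_div)]
    ordered = sorted(group, key=lambda f: f["means"][ch])
    total = sum(f["pixels"] for f in ordered)
    if total == 0:
        return parts
    running = 0
    for f in ordered:
        # Assumed rule: a quasi-field belongs to the subgroup that
        # contains the midpoint of its cumulative pixel range.
        mid = running + f["pixels"] / 2
        index = min(int(mid * n_div / total), n_div - 1)
        parts[index].append(f)
        running += f["pixels"]
    return parts


def tolerance_blocks(fields, channels, divisions):
    """Apply one cut per generating channel; returns the final blocks."""
    groups = [list(fields)]
    for ch, n_div in zip(channels, divisions):
        next_groups = []
        for group in groups:
            next_groups.extend(split_equal_pixels(group, ch, n_div))
        groups = next_groups
    return groups


# Hypothetical data: 16 quasi-fields of 10 pixels each on a 4 x 4 grid
# of (channel 0, channel 1) means.
fields = [{"means": [i % 4, i // 4], "pixels": 10} for i in range(16)]
blocks = tolerance_blocks(fields, channels=[0, 1], divisions=[2, 2])
sizes = [sum(f["pixels"] for f in blk) for blk in blocks]
```

Because the second cut is recomputed inside each first-cut subgroup, the cut points in the second channel generally differ from column to column, as the text notes.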

Table 1 gives a handy reference list of combinations of n_1, n_2, ...,
and the number of blocks produced for each. A description of computer
code for generating tolerance blocks is given in Appendix A.

(The cuts in Channel t_1 separate the quasi-fields
into five regions of nearly equal pixel size.)

Figure 4. FIRST CUT TO CREATE TOLERANCE BLOCKS

(The columns are equal-sized groups of quasi-fields separated
by cuts in Channel t_1. The rectangles are equal-sized groups of
quasi-fields separated by cuts in Channel t_2.)

Figure 5. FIRST AND SECOND CUTS TO CREATE TOLERANCE BLOCKS

TABLE 1. TABLE OF COMBINATIONS OF CHANNEL DIVISIONS FOR TOLERANCE
BLOCKS AND THE NUMBER OF BLOCKS PRODUCED

The blocks are spectrally homogeneous with respect to the generating
channels. How homogeneous they are depends on the number of divisions in
each channel. But because the number of blocks is the product of the
number of divisions, the number of divisions in each channel must be
small if we are to end up with a reasonably small number of blocks. So
spectral homogeneity of tolerance blocks is limited in two ways: some
channels are left out of the block construction and those that are
represented may have coarse divisions.

In order to achieve a greater spectral homogeneity, we defined a 
second tolerance block algorithm that uses all the spectral channels in 
the clustering process. The tolerance block means are used as seeds 
distributed like a network throughout spectral space. Around the seeds, 
clusters are formed by ordinary spectral clustering using a distance 
function. Although a subset of channels may have been used to create 
the blocks, all channels are used to compute the block means and carry 
out the clustering. We hoped to combine in one algorithm the virtues 
of uniformly-sized clusters and spectral homogeneity. 
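
A minimal sketch of the block-seeds idea follows. The text does not specify the distance function or how many passes the clustering makes, so the sketch assumes squared Euclidean distance, a single nearest-seed assignment, and pixel-weighted block means; the function names and data layout are hypothetical.

```python
def block_means(block, n_channels):
    """Pixel-weighted mean of a block over all channels (assumed weighting)."""
    total = sum(f["pixels"] for f in block)
    return [sum(f["means"][c] * f["pixels"] for f in block) / total
            for c in range(n_channels)]


def assign_to_seeds(fields, seeds):
    """Index of the nearest seed for each quasi-field, using squared
    Euclidean distance over all channels (an assumed distance function)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(seeds)), key=lambda k: dist2(f["means"], seeds[k]))
            for f in fields]


# Hypothetical quasi-fields in two channels, already grouped into two blocks.
fields = [
    {"means": [0.0, 0.0], "pixels": 10},
    {"means": [1.0, 0.0], "pixels": 30},
    {"means": [10.0, 10.0], "pixels": 20},
]
blocks = [fields[:2], fields[2:]]
seeds = [block_means(b, 2) for b in blocks]
cluster_of = assign_to_seeds(fields, seeds)
```

The point of the design is that the seeds inherit the even spread of the blocks, while the assignment step uses every channel, including those left out of the block construction.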

How well the tolerance block algorithms have succeeded in equalizing 
the clusters can be seen in Figure 6, a comparison of distributions of 
strata sizes produced by the three algorithms. BCLUST has a very uneven 
distribution as we have seen. Many clusters have only a very small num- 
ber of pixels. When the tolerance blocks themselves are used as clusters, 
the distribution is very even. When the tolerance blocks are used as 
seeds, the distribution is less even than for the blocks but considerably 
more even than for BCLUST.

MEASURE OF PERFORMANCE: THE FIXED SAMPLE
REDUCTION OF VARIANCE FACTOR


Although the tolerance block approach to spectral clustering equalizes
the size of the strata, the question remains whether it accomplishes its
main purpose: to produce strata that discriminate between wheat and
non-wheat. To answer this question we developed the measure of
stratification performance that is discussed in this section.


3.1 REDUCTION OF VARIANCE FACTOR (RV) 

The measure of performance heretofore used [4] to evaluate cluster- 
ing parameters and methods is the reduction of variance factor 


$$
RV = \frac{\sum_{\text{all strata } i} n_i \, p_i (1 - p_i)}{n \, p \, (1 - p)} \tag{1}
$$

where n_i is the number of pixels in stratum i,
      p_i is the proportion of wheat in stratum i,
      n is the number of pixels in the segment (n = Σ n_i),
      p is the proportion of wheat in the segment (p = Σ n_i p_i / n).

The RV is the ratio of two variances: the variance of the stratified
sample estimate divided by the variance of the unstratified sample
estimate. It is a number between 0 and 1. A small number is good. It means
that the stratified estimate has a considerably smaller variance than the
unstratified estimate and so the stratification is doing some good. We
can verify in expression (1) that if the strata are either pure wheat or
pure other, then either p_i or 1 - p_i is 0 and the numerator is 0. If
the stratification is worthless, then the p_i's are all the same as p and
the factor becomes 1.
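
The two limiting cases can be checked numerically with a short sketch of expression (1). The function name and the example strata are illustrative; the factor is undefined when the segment proportion p is exactly 0 or 1.

```python
def reduction_of_variance(strata):
    """Simple RV of expression (1).

    strata is a list of (n_i, p_i) pairs: pixel count and true wheat
    proportion of each stratum.
    """
    n = sum(n_i for n_i, _ in strata)
    p = sum(n_i * p_i for n_i, p_i in strata) / n
    numerator = sum(n_i * p_i * (1 - p_i) for n_i, p_i in strata)
    return numerator / (n * p * (1 - p))


pure = reduction_of_variance([(600, 0.0), (400, 1.0)])        # pure strata
worthless = reduction_of_variance([(600, 0.4), (400, 0.4)])   # p_i = p
partial = reduction_of_variance([(600, 0.1), (400, 0.85)])    # in between
```

Pure strata give 0, identical strata give 1, and the partially separated example falls between the two.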






3.2 RV WITH INTEGER ALLOCATIONS 


The RV as a performance measure is unrealistic in two ways. For
one thing, it assumes that we are allocating the sample in proportion
to the size of the strata. Such an allocation is optimal in the absence
of information about the true percent wheat p_i in each stratum.
But it is an approximation because the number of quasi-fields sampled
from a stratum must be an integer whereas, with few exceptions, the
proportional allocation is a real number.

The approximation becomes absurd when the number of strata increases 
beyond the size of the sample. Then strata must be sampled with a 
probability rather than with certainty and the variance should rise. 

But the simple expression (1) does not take account of this effect and 
continues to decrease (get better) as the number of strata increases. 

The approximation is not burdensome when we compare results for
clustering algorithms producing approximately equal numbers of strata.
But when the numbers are unequal, as when we are trying to find the
optimal number of clusters for a given algorithm, the comparison is
invalid.

So we can define a better performance measure by assuming a realis- 
tic sample size, say 100 quasi-fields, and allocating them to strata 
as best we can, that is, as nearly as possible proportional to size. 

If some strata are left unallocated, we’ll combine them into a waste- 
basket stratum and sample it. Then the RV becomes 



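$$
RV = \frac{\sum_{\text{all strata } i} \left(\frac{n_i}{n}\right)^2 \frac{p_i (1 - p_i)}{a_i}}{\frac{p (1 - p)}{a}} \tag{2}
$$

where n_i is the number of pixels in stratum i,
      p_i is the proportion of wheat in stratum i,
      a_i is the number of sample quasi-fields allocated to stratum i,
and n, p, a are the corresponding numbers for the segment.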



The allocations {a_i} are made by a subroutine ALLOCB* as follows:

1. Determine the theoretical allocation a n_i / n for each stratum i.

2. Round this number to the nearest integer.

3. Collect all the strata with allocation 0 into a wastebasket
stratum and allocate sample quasi-fields to it proportional
to size, but at least 1. Thus no strata are left out of the
sampling.

4. If the integer allocations don't add to a, multiply the
fractional allocations by 1 + e and repeat. e is chosen by an
algorithm that makes the procedure rapidly converge. There
are, however, some numerical combinations that prevent
convergence, and then we settle for an allocation that doesn't quite
add up to a.
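
The four steps can be sketched as below. This is not ALLOCB itself: the text does not say how e is chosen, so the sketch substitutes a bisection search on an overall scale factor (playing the role of 1 + e), rounds halves upward, and gives up after a fixed number of iterations. All names are hypothetical.

```python
def allocate_sample(sizes, a, max_iter=50):
    """Integer allocation of a sample quasi-fields to strata.

    sizes holds the pixel count n_i of each stratum. Returns
    (alloc, wastebasket, wastebasket_alloc): per-stratum allocations,
    indices of the zero-allocation strata, and the allocation given to
    the combined wastebasket stratum.
    """
    n = sum(sizes)

    def attempt(scale):
        # Steps 1 and 2: scaled theoretical allocation, rounded to nearest.
        alloc = [int(scale * a * n_i / n + 0.5) for n_i in sizes]
        # Step 3: pool the zero-allocation strata, at least one sample.
        waste = [i for i, k in enumerate(alloc) if k == 0]
        waste_size = sum(sizes[i] for i in waste)
        waste_alloc = max(1, int(scale * a * waste_size / n + 0.5)) if waste else 0
        return alloc, waste, waste_alloc

    lo, hi, scale = 0.5, 2.0, 1.0
    for _ in range(max_iter):
        alloc, waste, waste_alloc = attempt(scale)
        total = sum(alloc) + waste_alloc
        if total == a:
            break
        # Step 4 (assumed search rule): rescale and try again.
        if total < a:
            lo = scale
        else:
            hi = scale
        scale = (lo + hi) / 2
    return alloc, waste, waste_alloc


# Hypothetical strata sizes with a sample of 10 quasi-fields.
alloc, waste, waste_alloc = allocate_sample([500, 300, 120, 40, 25, 15], 10)
```

Some inputs never reach the target under this search either. With sizes 400, 350 and 250 and a = 10, the rounded total jumps from 9 to 11 as the scale passes 1, which matches the text's remark that certain numerical combinations prevent convergence.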

The RV with integer allocation (2) is not likely to improve as the 
number of strata exceeds the sample size because the number of terms 
being summed in the numerator of (2) remains constant and the waste- 
basket stratum, in all probability heterogeneous, increases in size. 

3.3 THE FIXED-SAMPLE RV 

A second unrealistic assumption in using expression (1) is sampling
with replacement. In fact, it is only reasonable to assume sampling
without replacement, implying a hypergeometric**, rather than a binomial,
model. The effect on the RV is to multiply top and bottom by correction
factors as follows:


*ALLOCB is very similar to the allocation subroutine in Procedure M.

**We are indebted to T. Pendleton, Johnson Space Center, NASA, for this
suggestion.

$$
\text{Fixed-Sample RV} = \frac{\sum_{\text{all strata } i} \left(\frac{n_i}{n}\right)^2 \frac{p_i (1 - p_i)}{a_i} \left(\frac{b_i - a_i}{b_i - 1}\right)}{\frac{p (1 - p)}{a} \left(\frac{b - a}{b - 1}\right)} \tag{3}
$$

where n_i is the number of pixels in stratum i,
      p_i is the proportion of wheat in stratum i,
      a_i is the number of sample quasi-fields allocated to stratum i,
      b_i is the number of quasi-fields in stratum i,
and n, p, a, b are the corresponding numbers for the segment.


This is the realistic performance measure that we will use for com- 
paring clustering methods. It is still an approximation because it as- 
sumes that all sample quasi-fields are the same size*. 


An implication of the finite correction factors is that stratification
incurs a cost. Let us illustrate by an example. Suppose that we
create 100 strata, so evenly divided that we allocate one sample
quasi-field to each stratum. The correction factor in the numerator is
always 1 and drops out. In the denominator, b, the number of quasi-fields,
might typically be 400, so the correction factor is about 3/4. Now suppose
that the stratification completely fails to discriminate, so that p_i is
constantly equal to p. Then everything cancels out but the 3/4 and we are
left with a reduction of variance factor of 1 1/3! This means the
variance of the stratified estimate is 1/3 more than that of the
unstratified estimate. Stratification hasn't helped in this case!
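
The worked example can be reproduced with a short sketch of the fixed-sample RV. The correction factors are taken to be (b_i - a_i)/(b_i - 1) for each stratum and (b - a)/(b - 1) for the segment, which is the form consistent with the 3/4 quoted in the example; the function name, the handling of a single-field stratum, and the value p_i = 0.3 are assumptions made for illustration.

```python
def fixed_sample_rv(strata):
    """Fixed-sample RV with finite correction factors.

    strata is a list of (n_i, p_i, a_i, b_i): pixel count, wheat
    proportion, sample allocation, and number of quasi-fields.
    """
    n = sum(s[0] for s in strata)
    a = sum(s[2] for s in strata)
    b = sum(s[3] for s in strata)
    p = sum(n_i * p_i for n_i, p_i, _, _ in strata) / n
    numerator = 0.0
    for n_i, p_i, a_i, b_i in strata:
        # A stratum with a single quasi-field is fully enumerated, so its
        # variance contribution is taken as zero (assumption of this sketch).
        fpc = 0.0 if b_i == 1 else (b_i - a_i) / (b_i - 1)
        numerator += (n_i / n) ** 2 * p_i * (1 - p_i) / a_i * fpc
    denominator = p * (1 - p) / a * (b - a) / (b - 1)
    return numerator / denominator


# 100 equal strata, one sample each, 4 quasi-fields per stratum (b = 400),
# and no discrimination at all: every p_i equals p.
strata = [(100, 0.3, 1, 4)] * 100
rv = fixed_sample_rv(strata)
```

The result is 399/300, about 1.33, in agreement with the 1 1/3 obtained in the text from the rounded factor 3/4.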


This example is extreme because if the stratification were made at
random, then just by chance we would expect some p_i's to be 0 or 1, and
perhaps others to be closer to 0 or 1 than p. So two opposing forces
influence stratification: the finite correction factors penalize
stratification and discrimination of wheat from non-wheat rewards it. If
the stratification is made at random, the two forces would be expected to
approximately cancel each other out, as is shown in Appendix B.

* In fact, they are not, and the unbiased scheme used in Procedure M for
sampling from unequal-sized quasi-fields [1, pp 31-37] does not have
a simply-expressed variance.

A theorem by W. Cochran [5] implies that the simple RV (1) never 
increases and will usually decrease when any of the strata are broken 
up into smaller strata. This theorem led us into the comfortable belief 
that stratification, even if irrelevant, could only help. Cochran's warn- 
ing that the theorem does not precisely apply to finite sampling is 
exemplified by our sampling problem, in which the gain or loss from 
stratification depends on how pure the strata are with respect to the 
crops of interest. They have to be pure enough to compensate for the 
finite correction factors or stratification hurts.

EXPERIMENTS ON 12 KANSAS SEGMENTS TO EVALUATE
THE TOLERANCE BLOCK CLUSTERING ALGORITHMS

To evaluate the tolerance block techniques of clustering, we conducted
experiments on 1976 LACIE Phase 2 data from 12 segments in Kansas.
The application of the data was to measure the amount of winter wheat
grown in these segments, so stratum purity was defined as the separation
of wheat from non-wheat. The Tasseled Cap transformed channels
Brightness and Greenness [2, pp 6-10] from three biowindows were used as
follows:


               Brightness    Greenness
Biowindow 1    Channel 1     Channel 2
Biowindow 2    Channel 3     Channel 4
Biowindow 3    Channel 5     Channel 6


The 12 segments were chosen from the blind sites so that ground
truth could be used to measure the performance of the clusterings. Only
segments with clear data for the three biowindows were used. They were:
1021, 1035, 1165, 1851, 1852, 1861, 1865, 1886, 1163, 1167, 1860 and
1887.


The fixed-sample RV was used as a performance measure with a sample
size of 100 assumed. In the remainder of this report, we will multiply
the reduction of variance factors by 1000 and refer to the RV
(expression 1) and the 100-sample RV (expression 3) as the case may be.
The RV will always be, and the 100-sample RV nearly always be, between 0
and 1000, the smaller, the better.

To review, the three algorithms being compared are:

1. BCLUST, which accumulates clusters using a spectral distance
function.

2. "Blocks alone", in which the tolerance blocks themselves are
clusters.

3. "Block seeds", in which the tolerance block means are used as
seeds for accumulating clusters.

Our motivation in this study was to try to improve the clustering
so that the strata would achieve purity comparable to that of the blob
interiors. Obviously, this goal could not be achieved if we calculated
the wheat proportion p_i from all the pixels in stratum i whether they be
edge or interior. The degradation of the RV factor as we move from
quasi-field interiors to the whole quasi-fields would then necessarily be
reflected in the RV factor for strata.

The clustering process operates on the means of the quasi-field
interiors. What we want to know is whether we can so successfully cluster
these interior means that the purity of the clusters (as measured by
the RV or the 100-sample RV) approaches that of the quasi-field interiors
themselves.

For this purpose, the ground truth of the quasi-field interiors
extrapolated to the whole fields is appropriate. Such an extrapolation
has been used to provide a close approximation to the percent wheat in a
segment. As we have pointed out, it would result in strata apparently
purer than a pixel count would verify. We aren't, however, interested in
purity measured by percent of pixels, but rather purity in the sense
that wheat fields are grouped together in strata and so are the
non-wheat fields. The kind of purity we are interested in is best measured
by the truth that best characterizes the quasi-fields.

4.1 TESTS TO DETERMINE WHICH CHANNELS TO USE FOR TOLERANCE BLOCKING 

As discussed previously, we can carry out the tolerance blocking
in many different ways (see Table 1). We can use from 1 to 6 channels
for the blocking. The fewer channels we use, the more divisions we can
allow in each channel. The order in which the channels are blocked could
make a difference. To find a good blocking configuration in a reasonable
length of time, we carried out the search in three stages.

4.1.1 TEST OF THE HELPFULNESS OF THE CHANNELS IN BIOWINDOW 1 

In the first stage, we conducted a test of the relevance of the data 
from Biowindow 1. The motivation for the test was that the number of 
possible combinations of channels is bewildering, and if we could deter- 
mine that two of the channels were not really helping, we could cut down 
this number considerably. 

The experiment consisted of running BCLUST so that exactly 40 clusters
were produced, first using Channels 1 through 6 and then Channels 3
through 6. The results are given in Table 2 in terms of the RV. Analogous
results would have been obtained with the fixed-sample RV because the
finite correction factors would have been similar in each case.

In six of the segments, a substantial reduction in the RV is obtained 
by including the first two channels. In the other six segments, the dif- 
ference is trivial. The average difference is 48 points. A t test for 
differences shows that the significance level of the improvement in the 
12 segments is 0.025. There seems to be no relation between the Julian 
date of pass 1 and the improvement in RV. 
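
The average difference and the t value quoted here can be recomputed from the twelve pairs of RV factors in Table 2. The sketch below uses a paired t statistic with 11 degrees of freedom; the variable names are illustrative.

```python
import math

# RV factors from Table 2, in segment order.
with_12 = [126, 538, 814, 349, 361, 317, 552, 453, 512, 516, 350, 462]
without_12 = [225, 521, 810, 388, 436, 442, 541, 456, 653, 531, 323, 597]

# Negative differences mean that Channels 1 and 2 lowered (improved) the RV.
diffs = [w - wo for w, wo in zip(with_12, without_12)]
m = len(diffs)
mean_diff = sum(diffs) / m
sample_var = sum((d - mean_diff) ** 2 for d in diffs) / (m - 1)
t_value = abs(mean_diff) / math.sqrt(sample_var / m)
```

The mean difference is -47.75 and the t value is close to 2.59, matching the table. It exceeds the one-sided 0.025 critical value for 11 degrees of freedom, which is about 2.20.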

We conclude that we cannot dispense with Biowindow 1 in our study 
of tolerance block clustering. 

4.1.2 SEARCH FOR THE BEST PAIR OF CHANNELS FOR TOLERANCE BLOCKING 

In the second stage we tested pairs of channels and single channels.
The purpose was to find the best pair of channels and include it in a
favored position in all the channel combinations tested in the third
stage. A second purpose was to compare results from two orderings of
the same combination of channels.


TABLE 2. RV FACTORS OBTAINED BY RUNNING BCLUST
WITH AND WITHOUT CHANNELS 1 AND 2

(The smaller the RV the better.)

           Julian Date         RV Factor
Segment    of Pass 1      With 1&2    Without

 1020          92           126         225
 1035         312           538         521
 1165         326           814         810
 1851          19           349         388
 1852         295           361         436
 1861         349           317         442
 1865         349           552         541
 1886         311           453         456
 1163          70           512         653
 1167          70           516         531
 1860         294           350         323
 1887         311           462         597

Average Difference                      -48
t Value for Difference                  2.59
Significance of t                       0.025

The results of the single-channel test are summarized in terms of
RV in Table 3 and of the pair test in Table 4. We would expect analogous
results with the fixed-sample RV because of the constant sample size.

The single-channel results identify Channel 4 (greenness in the 
second biowindow) as the most helpful discriminator of all the channels 
and indicate that the greenness channels are more helpful than the bright- 
ness channels. 

The pair results in Table 4 present us with a dilemma: which results
are more relevant, those obtained from blocks alone or from block seeds?
If we were going to limit the number of channels used in the blocking to
two, then the block-seeds results would be most applicable because as we
shall see in Section 4.3, the block-seeds RV is consistently lower than
the blocks-alone RV.

However, to find the pair of channels that will best combine with other 
channels to form multi-channel blocks, the blocks-alone results seem most 
helpful. The seeding operation carries us one computational step away 
from the effect of separating the data space according to the channel pair. 
One feels that when the seeding step is applied, differences that showed 
up in the blocking stage are to some extent averaged out. This conclusion 
is reinforced by the relative uniformity of the block seeds results in 
Table 4 and by the invariance of the blocks-alone results over reversed 
pairs. Therefore, in our search for the best combining pair, we give 
greater weight to the blocks-alone results. This is why the single- 
channel test, which was run subsequent to the pair test, has the blocks- 
alone results only. 

Our conclusion is that 3 and 4 (Biowindow 2) are the best combining
pair of channels and that, at least in an eight-by-eight blocking, it
makes no difference which channel is blocked first. We'll keep our eye
on Channel 2 because it showed up well in the single-channel test and
was in the only significant pair in the block-seeds pair test.

TABLE 3. COMPARISON OF 64-DIVISION SINGLE-CHANNEL TOLERANCE BLOCKS

                         Channel     RV

Biowindow 1    Bright       1         37
               Green        2        -34
Biowindow 2    Bright       3         68
               Green        4        -75*
Biowindow 3    Bright       5         12
               Green        6         -8

(The tabulated number is the average over 12 segments of the
difference between the single-channel RV for a segment and the average
RV over all single channels for that segment. A negative number is
a good score. These results are given for blocks as clusters only.)

*Difference significant by t test at 0.05 level


TABLE 4. COMPARISON OF 2-CHANNEL TOLERANCE BLOCKINGS CONSTRUCTED
FROM EIGHT DIVISIONS IN EACH CHANNEL

              (RV)                   (RV)
Pair    Blocks as Clusters    Clusters Seeded by Block Means

3 4          -56*                     2
4 3          -54*                     1
5 6           14                     -7
6 5           10                    -10
6 4            4                      3
4 6            8                      7
5 4          -15                     -8
3 6            6                     -2
1 4            2                     14
2 4            2                    -15*
1 2           25                     10
2 1           36                     -2
1 6           24                      0
2 6           -4                      6

(The tabulated number is the average over 12 segments of the
difference between the pair RV in a segment and the average RV over
all pairs in that segment. A negative number is a good score.)

*Difference significant by t test at 0.05 level

That Channel 3 really does help Channel 4 is shown by the fact that
the average RV for Channel 4 alone is 594 and for the pair (3,4) is 489,
more than 100 points lower. An interpretation of this difference is that
10% fewer quasi-fields are needed in the sample when Channels 3 and 4
generate the clusters than when Channel 4 alone does.

4.1.3 SEARCH FOR THE BEST CHANNEL SET INCLUDING THE BEST PAIR 

The third stage of the channel search was to test various combina- 
tions of channels, each combination incorporating the best pair (3,4). 

Although the two-channel test did not help us decide the order of the 
channels, other results showed that the last channel in the blocking 
process is better ordered by the blocking than the earlier channels. 

So in the tested combinations we put Channel 4 last, Channel 3 next to 
last and, aside from these two, favored Channel 2. 

In all combinations we kept the number of blocks fixed at 96. The
patterns of channel divisions were as follows:

Number of Channels    Pattern of Channel Divisions

        2             8 12
        3             4 4 6
        4             2 3 4 4
        5             2 2 2 3 4
        6             2 2 2 2 2 3

The combinations tested were all possible combinations of the
other channels with 3 and 4. The only combinations permuted were
(6, 2, 3, 4) and (5, 6, 2, 3, 4). (6, 2, 3, 4) seemed like a good
bet because it contains the green channels along with 3 and 4, and
(5, 6, 2, 3, 4) seemed also a good five-channel combination to try
because it left out Channel 1, which had been indicated to be least
effective.

The results of the test are summarized in Table 5. The most
remarkable feature of the results is their uniformity. The largest
difference from average is 20 points, a modest difference compared with
the 75 points that distinguished Channel 4, the 56 points that
distinguished the pair (3,4), the 100-point improvement of (3,4) over 4
alone, and the three-figure differences occasioned by leaving out
Channels 1 and 2.

When the (3,4) blocking was used as seeds, the RV came out a small, 
but statistically significant 19 points worse than average. With 8 and 
12 divisions in the two channels, it is possible that the seeds were not 
scattered widely enough in six-dimensional space. Instead of taking 12 
divisions in a channel and chopping it up so fine, we might as well take 
one more channel and divide the three channels into four, four and six 
parts. 

Of the three-channel blockings, (2, 3, 4) seems to be slightly
preferable. This (2, 3, 4), the four-channel, five-channel and
six-channel combinations are all indistinguishable in performance. We will
use a four-channel combination (6, 2, 3, 4) which has the three green
channels and the good pair (3,4).

4.2 OPTIMAL NUMBER OF CLUSTERS FOR THE ALGORITHMS

In order to compare the two tolerance block clustering methods 
with BCLUST clustering, we need to know at what number of clusters, 
on the average, each algorithm performs best. Then we will have a 
valid comparison between the algorithms at their best parameter settings. 

For each algorithm, we computed the most realistic performance 
measure, the 100-sample RV, for a variety of numbers of clusters be- 
tween 16 and 96. A computer program interpolated this number for all 
integers included in the range and averaged the interpolated value for 
the 12 segments. 
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TABLE 5. COMPARISON OF 96-BLOCK CLUSTERINGS CONSTRUCTED 
FROM CHANNEL COMBINATIONS CONTAINING 3 AND 4


                         (RV)                  (RV)
Combination       Blocks as Clusters   Clusters Seeded by Block

3 4                       5                   19*

5 3 4                    11                    3
6 3 4                     3                   11
2 3 4                    -4                   -5
1 3 4                    20                    2

5 6 3 4                  -9                    3
1 2 3 4                   4                   -6
6 2 3 4                  -3                    3
2 6 3 4                  -5                    5
5 2 3 4                 -10                   -6
1 5 3 4                   5                   -9
1 6 3 4                   3                    0

5 6 2 3 4                -3                  -11
2 5 6 3 4                -2                   -5
1 5 6 3 4                -8                   -7
1 6 2 3 4                -5                    0

1 5 6 2 3 4               2                    2


(The tabulated number is the average over 12 segments of the
difference between the combination RV in a segment and the average
RV over all combinations in that segment. A negative number is
a good score.)

*Difference significant by t test at 0.05 level


The number of clusters produced by BCLUST is varied by adjusting 
a parameter t, the greatest distance a quasi-field can be from a cluster 
mean and still belong to the cluster. Figure 7 shows the graph of 
BCLUST performance as a function of the number of clusters. It is a 
smooth curve, because of interpolation and averaging, with a minimum 
(best score) at about 40 clusters. 

The number of clusters produced by the tolerance block algorithms
is varied by changing the number of divisions in the channels that
generate the blocks. Table 6 shows the divisions producing 11 cluster
numbers between 16 and 96.
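
As a small arithmetic check, the nominal number of tolerance blocks is the product of the numbers of divisions in the blocking channels. The sketch below uses the divisions listed in Table 6 for channels (6, 2, 3, 4); the names `TABLE_6_DIVISIONS` and `block_count` are introduced here for illustration only.

```python
from math import prod

# Divisions in channels (6, 2, 3, 4), keyed by the number of clusters (Table 6).
TABLE_6_DIVISIONS = {
    16: (2, 2, 2, 2), 24: (2, 2, 2, 3), 32: (2, 2, 2, 4), 36: (2, 2, 3, 3),
    40: (2, 2, 2, 5), 48: (2, 2, 3, 4), 54: (2, 3, 3, 3), 60: (2, 2, 3, 5),
    72: (2, 3, 3, 4), 81: (3, 3, 3, 3), 96: (2, 3, 4, 4),
}


def block_count(divisions):
    """Nominal number of tolerance blocks for the given per-channel divisions."""
    return prod(divisions)


for clusters, divisions in TABLE_6_DIVISIONS.items():
    print(clusters, divisions, block_count(divisions))
```

The count is nominal: as Appendix A notes, a data group dominated by one or two large quasi-fields may not divide fully, so the realized number of blocks can be smaller.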

The performance of the tolerance block clustering algorithms as
a function of the number of clusters is shown numerically in Table 6
and graphically in Figures 8 and 9. The block-seeds algorithm has a
minimum at 40 clusters. The blocks-alone algorithm has a minimum at
48. While the minimum is a razor-thin choice of 48 over 96, the next
best numbers are all in the 32 to 54 range, lending support for the
validity of a minimum at 48.

In this section we have seen three examples of an optimal number 
of strata considerably smaller than the sample size — examples of how 
the benefits from increased stratification were not sufficient to 
cover the cost of stratification. 

4.3 COMPARISON OF THE TOLERANCE BLOCK ALGORITHMS WITH BCLUST 

We can now compare the performance of the three clustering
algorithms. The performance measure is the 100-sample RV and is measured
at the optimal number of clusters for each algorithm: 40 for BCLUST
and block-seeds, and 48 for blocks-alone.

The results for each of the 12 Kansas segments and the average
results for the 12 are given in Table 7. The blocks-alone algorithm
averages 70 points worse than BCLUST and 74 points worse than block- 
seeds — differences that are significant by a t test. Also, the 
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(The performance measure is the 100-sample RV 
averaged over 12 Kansas segments.) 

Figure 7. BCLUST PERFORMANCE AS A FUNCTION OF THE 
NUMBER OF CLUSTERS 
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(The performance measure is the 100-sample RV 
averaged over 12 segments.)

Figure 8. PERFORMANCE OF BLOCK SEEDS AS A FUNCTION
OF THE NUMBER OF CLUSTERS
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(The performance measure is the 100-sample RV 
averaged over 12 segments.) 

Figure 9. PERFORMANCE OF BLOCKS AS CLUSTERS AS A FUNCTION 
OF THE NUMBER OF CLUSTERS 






TABLE 6. CHANNEL DIVISIONS AND PERFORMANCE OF THE TOLERANCE
BLOCK CLUSTERING ALGORITHMS AS A FUNCTION OF THE
NUMBER OF CLUSTERS


              Number of Divisions
                in Each Channel
             (Channels Used For
Number of         Blocking)              100-Sample RV
Clusters       6    2    3    4     Block-Seeds   Blocks-Alone

   16          2    2    2    2         553           633
   24          2    2    2    3         528           621
   32          2    2    2    4         539           601
   36          2    2    3    3         537           602
   40          2    2    2    5         514           641
   48          2    2    3    4         541           588
   54          2    3    3    3         541           599
   60          2    2    3    5         534           653
   72          2    3    3    4         538           635
   81          3    3    3    3         532           655
   96          2    3    4    4         538           589


(The performance measure is the 100-sample RV averaged over 12 Kansas
segments.)


TABLE 7. COMPARISON OF THREE CLUSTERING ALGORITHMS


            Quasi-Field
             Interior       BCLUST       Blocks-Alone    Block-Seeds
Segment         RV        40 Clusters    48 Clusters     40 Clusters

 1020           39            181            239             217
 1035          187            624            578             560
 1165          204            832            872             922
 1851          155            383            495             404
 1852          136            423            614             454
 1861           90            355            396             389
 1865           86            610            615             580
 1886          168            502            532             452
 1163          283            622            725             621
 1167          178            652            813             614
 1860          145            385            420             361
 1887          168            643            758             588

Average        153            518            588             514

Average Difference              4            -74
t                             .31          -3.47
Significance            Not Significant  Significant
                                           at .005
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preference is consistent: 11 out of 12 segments for each comparison. 

Although the clusters produced by blocks-alone have the sampling
advantage of uniform size, they are probably less homogeneous spectrally
than the clusters from the other algorithms. Two of the channels were
not considered at all by blocks-alone, so the clusters would not be very
homogeneous in those channels. The channels that were used had 2, 2,
3 and 4 divisions in them, so homogeneity was imperfect. The clusters
of the other two algorithms, by contrast, were formed by a spectral
distance function and thus emphasized spectral homogeneity.

Between the best tolerance block algorithm (block-seeds) and BCLUST, 
there is no significant difference. In addition, the preference for one 
algorithm or the other is equally divided among the 12 segments. Thus, 
the evidence of this experiment is that tolerance block clustering does 
not improve spectral stratification. 
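
The significance figures in Table 7 can be checked with a short calculation. The sketch below assumes the t test was a paired test on the 12 per-segment differences; the text only says "a t test", so the pairing is an assumption, though it appears to reproduce the tabulated t values to two decimals. The data are the per-segment 100-sample RV scores from Table 7.

```python
from math import sqrt

# 100-sample RV per segment, in the segment order of Table 7.
BCLUST = [181, 624, 832, 383, 423, 355, 610, 502, 622, 652, 385, 643]
BLOCKS_ALONE = [239, 578, 872, 495, 614, 396, 615, 532, 725, 813, 420, 758]
BLOCK_SEEDS = [217, 560, 922, 404, 454, 389, 580, 452, 621, 614, 361, 588]


def paired_t(a, b):
    """Mean difference a - b and the paired t statistic (n - 1 d.f.)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # sample variance
    return mean, mean / sqrt(var / n)


print(paired_t(BCLUST, BLOCK_SEEDS))        # BCLUST versus block-seeds
print(paired_t(BLOCK_SEEDS, BLOCKS_ALONE))  # block-seeds versus blocks-alone
```

Counting the segments in which block-seeds scores lower than blocks-alone gives the "11 out of 12" preference cited in the text.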

The "quasi-field interior RV" column, measuring the purity of the
interiors of the quasi-fields that make up the strata, is included as
a standard of comparison. These low scores show that most of the
quasi-field interiors have zero or 100 percent wheat or very close to
it. A perfect clustering technique would put the zero percent
quasi-fields in some clusters, the 100 percenters in others, and achieve
similar RV scores. Yet Table 7 shows a 361-point average difference
between the scores. The interior RV was calculated by an expression
analogous to (1), so it is not strictly comparable, but even if we raise
all the scores in the interior RV column by 1/3 to approximate the
effect of the finite sampling correction factors, a tremendous gap
remains.
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5 

CONCLUSIONS AND RECOMMENDATIONS 


5.1 CONCLUSIONS 

Two tolerance block techniques and a clustering technique for
spectral stratification were evaluated with respect to the estimation
of winter wheat acreage in 12 LACIE segments in Kansas. The techniques
are (1) to accept tolerance blocks as clusters, (2) to use all-channel
means of tolerance blocks as fixed seeds for spectral clustering, and
(3) to conduct unsupervised spectral clustering (BCLUST).

Of the two tolerance block techniques, the seeded clustering tested 
significantly better as measured by the 100-sample reduction of variance 
factor (a performance measure on the scale of 0 to 1000 that is similar 
to a previously-defined reduction of variance factor but which, more 
realistically, takes account of sampling efficiency) . Blocks as clus- 
ters produced more evenly-sized clusters, which enables efficient 
sampling, but this advantage was more than balanced by the greater 
spectral homogeneity of the seeded clusters. 

When the tolerance-block-seeded clustering was compared with the
unsupervised clustering method BCLUST, there was no significant
difference. So in our experiment, the better of the two tolerance block
stratification techniques did not show any improvement over previous
methods.

A gap of better than 300 points remains between the 100-sample RV
scores achieved by our two best stratification methods (about 515) and
what is theoretically attainable, the score of 153 for quasi-field
interiors.

The optimal number of strata for a sample of size 100 was not
found to be 100 or anything close to it, but rather, 40 for BCLUST
and the block-seeded algorithm and 48 for the blocks-themselves algorithm.
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The reason the optimum numbers weren’t higher is because correction fac- 
tors applied to finite sampling variances imply a cost to stratification 
that must be made up by purity of strata. In our experiment, 96 fine 
strata were not enough purer than 40 coarser strata to defray the cost
of the additional strata. 

In pursuit of these main conclusions, some subsidiary conclusions 
were reached. 

1. Tolerance block clusters were more uniformly sized than BCLUST 
clusters, enabling them to be sampled more efficiently. How- 
ever, this advantage did not result in better stratification 
performance. 

2. Channels in the first biowindow do help the clustering as 
applied to winter wheat estimation. The reduction of variance 
score for BCLUST averaged 48 points better when these channels 
were included. 

3. The best channel subsets for generating tolerance blocks con- 
tain brightness and greenness from the second biowindow. 

5.2 RECOMMENDATIONS

The tolerance block study could be carried a little further by 
investigating the use of tolerance block means as seeds and allowing
the updating of means and/or cluster creation and/or iteration of 
clustering. But the payoff from this effort is likely to be small 
when we compare the distant goal of relatively pure clusters with 
the modest scores of the clustering methods tested. 

A more promising approach would be to redefine features and test 
the clustering of these new features using the criterion of the 100- 
sample reduction of variance factor. The Tasseled Cap features we 
used in the experiment have the virtue of universal applicability.
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Their use implies only that different materials and crops are localized 
in separate neighborhoods in spectral space. The relative poorness of 
the stratification performance indicates the need of features better 
tailored to the decision problem being considered. Such features could 
be so specialized that they depend on the crops to be recognized, the 
confusion crops, the climate, and the prevalent varieties and agricul- 
tural practices. There is still room to hope that less special, green- 
profile-type features [1, pp 20-30] might have a general application 
to agricultural decision problems. 

If better features are found, there could be a greater reward for 
dividing the feature space into finer strata. Then the sampling advan- 
tage gained by the size uniformity of tolerance block clustering could 
have a greater effect on the performance comparison with BCLUST. So 
it is too soon to dismiss tolerance block clustering methods from 
consideration. 

The search for features is made in the hope of closing the gap 
between the RV of .5 found for the strata and the RV of .15 measuring 
the purity of the quasi-fields. The possible existence of confusion 
crops inherently inseparable from wheat could define a higher bound 
than .15 for achievable separability. It may be possible to measure 
this bound directly, possibly on the basis of a count of identical 
pairs of data vectors arising from wheat and non-wheat fields, and to 
chart its value as a function of the acquisitions available. Such a 
study would give useful feedback in the search for features and provide 
a warning when multispectral estimation alone is insufficient. 
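
One way the suggested count might look is sketched below. It is only an illustration of the idea of counting identical data-vector pairs that arise from wheat and non-wheat fields; the function name, the use of exact equality on integer data vectors, and the sample vectors are all assumptions made here, not part of the study.

```python
from collections import Counter


def confusable_pairs(wheat_vectors, other_vectors):
    """Count (wheat, non-wheat) pairs whose data vectors are identical.

    Each vector is a sequence of integer channel values. A vector seen
    a times in wheat and b times in non-wheat contributes a * b pairs.
    """
    wheat = Counter(map(tuple, wheat_vectors))
    other = Counter(map(tuple, other_vectors))
    return sum(wheat[v] * other[v] for v in wheat.keys() & other.keys())


# Hypothetical two-channel data vectors.
wheat = [(31, 40), (31, 40), (28, 52)]
non_wheat = [(31, 40), (19, 33)]
print(confusable_pairs(wheat, non_wheat))
```

A count of zero would mean no pair is inherently inseparable on the available channels; a large count would signal a separability bound that no choice of features derived from those channels can beat.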

We should not overlook the possibility that other clustering 
methods might perform significantly better than the ones we tested. 
CLASSY is now running after much theoretical and practical development. 
How would it do on the same 12 Kansas blind sites? This would give us 
another data point for assessing the potential of clustering with our
present features and also provide an opportunity to improve the cluster- 
ing component of Procedure M. 
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As ways are found to improve spectral clustering, the remaining 
decision of identifying the clusters becomes less subjective and error 
prone. In the extreme, we need make only one identification per cluster 
and this could be done from a smoothed mean value with appropriate atten- 
tion to historical and economic data. So the finding of better features 
and clustering methods leads directly to the goal of objective, accurate 
crop acreage estimation. 



APPENDIX A 

COMPUTER CODE FOR CONSTRUCTING TOLERANCE BLOCKS 

For convenience, we will use the word "blob" in place of the word
"quasi-field" in this appendix.

The general outline of the algorithm for constructing tolerance
blocks is given in Figure A.1. Detailed XTRAN code for the construction
of tolerance blocks and computation of mean data vectors for tolerance
blocks is given in Figure A.2. XTRAN is a language extending FORTRAN
in several ways that will be obvious to the reader.

The general outline speaks for itself and we will assume the reader 
has gone through it. The detailed code contains conventions particular 
to the clustering program containing it. The following are some notes 
explaining the code. 

We start by assuming that all the blobs being processed are indexed 
L = 1,...,QNSS. The channel data values in the blobs are contained in a 
data array FDATA(K,L) (floating point) or equivalently DATA(K,L) (integer), 
where K is the channel number and L is the blob index, a number between 
1 and QNSS. 

252: Bypass tolerance block construction if NTOL, the specified
number of channels for constructing tolerance blocks, is 0.

255 and 257: SEGNO is the segment index number of the blob.
A group of segments are given indices, say 1-40, for ease of array
storage. If SEGNO = 0, the data point is not a true blob and should
be disregarded. If IT(SEGNO) = 0, the blob is from a segment that the
user has decided not to process, so the blob is disregarded.

256: PIX(L) is the number of pixels in blob L.

253-261: The NBLOB acceptable blobs are identified by a PO vector
referring to the index of each acceptable blob.


User specifies channels TOL(1)...TOL(NTOL) for constructing blocks
and the number of classes NCLASS(1)...NCLASS(NTOL) each channel
divides the data into.

Read blobs 1,...,QNSS. The blobs are thus indexed. Some may be
unacceptable.

Define the first data group as the NBLOB acceptable blobs.
The group is identified by a position vector PO(1)...PO(NBLOB)
giving the index of each acceptable blob.

    The algorithm consists of permuting PO(1)...PO(NBLOB) until it
    orders the blobs into tolerance blocks. Where the blocks begin
    and end will be shown by a vector CL(1)...CL(NC) giving the
    number of blobs in each data group. At the end, the data
    groups are the tolerance blocks. At the start, there is just
    one data group of all NBLOB blobs. During the algorithm, the
    data groups are subdivided according to the data values of the
    channels used for construction.

So to start with, CL(1) = NBLOB and NC = 1.

Do the indicated scope for each tolerance channel TOL(M), M = 1,...,NTOL.

    Do the indicated scope for each data group I, I = 1,...,NC.

        Form a vector V of length CL(I) of channel TOL(M) values in
        the data group.

        Sort V and, at the same time, permute the part of PO
        corresponding to data group I.

        Cut up data group I into NCLASS(M) subgroups of nearly equal
        pixel size, building onto a subgroup vector CC of the
        numbers of blobs in the subgroups.

    Make the new NC equal to the total number of subgroups.
    Move the CC vector to CL.

End with NC: the number of tolerance blocks
         CL(1)...CL(NC): the number of blobs in each block
         PO(1)...PO(NBLOB): blob indices ordering the blobs into blocks


Figure A.1. GENERAL OUTLINE OF THE ALGORITHM FOR CONSTRUCTING
TOLERANCE BLOCKS
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CUHPUTE CL(l)...CL(MC), the SIZEB OF THE TOLERANCE BLOCKS 
250  *  AND A POSITION VECTOR PO ORDERING THE BLOBS BY TOLERANCE BLOCKS
251  *  AT PRESENT, THIS OPTION ASSUMES THAT ALL THE PIXELS ARE ON ONE LINE
252     IF NTOL = 0 GO TO ENDTOL
253     NBLOB = 0
254     DO L=1, QNSS
255        SEGNO = DATA(C19,L)
256        PIX(L) = BASE*DATA(C14,L) + DATA(C15,L)
257        IF SEGNO ¬= 0 & IT(SEGNO) ¬= 0
258           NBLOB = NBLOB + 1
259           PO(NBLOB) = L
260        END IF
261     END DO
262     NC = 1
263     CL(1) = NBLOB
264  *
265     M = 0; DO WHILE M < NTOL; M = M + 1
266        J = 0
267        NCC = 0
268        TM = TOL(M)
269        NCM = NCLASS(M)
270  *
271        I = 0; DO WHILE I < NC; I = I + 1
272           CLI = CL(I)
273           NPIX = 0
274           DO L=1, CLI
275              V(L) = FDATA(TM, PO(J+L))
276              NPIX = NPIX + PIX(PO(J+L))
277           END DO
278  *        SORT V AND AT THE SAME TIME PERMUTE PO(J+1)...PO(J+CL(I))
279           CALL VSORTP(V, CLI, PO(J+1))
280  *
281  *        SPLIT C(I) UP INTO NCLASS(M) SUBCLASSES OF "EQUAL" PIXEL SIZE
282           LPIX = 0
283           SUMPIX = 0
284           OLDL = 0
285           NLEFT = NCM
286           QUO = NPIX/NCM
287           L = 0; DO WHILE L < CLI; L = L + 1
288              OLDPIX = LPIX
289              LPIX = LPIX + PIX(PO(J+L))
290              IF LPIX >= QUO
291                 IF LPIX - QUO > QUO - OLDPIX
292                    L = L - 1
293                    LPIX = OLDPIX
294                 END IF
295                 IF L > OLDL
296                    NCC = NCC + 1
297                    CC(NCC) = L - OLDL
298                 END IF
299                 OLDL = L
300                 NLEFT = NLEFT - 1
301                 SUMPIX = SUMPIX + LPIX
302                 IF NLEFT > 0 QUO = (NPIX - SUMPIX)/NLEFT
303                 LPIX = 0
304              END IF
305           END WHILE
306  *
307           J = J + CLI
308        END WHILE
309  *
310        CALL MOVER(CC, CL, NCC)
311        NC = NCC
312     END WHILE
313     NCELL = NC
314  *
315  *  IF YOU WANT THE TOLERANCE BLOCKS THEMSELVES AS B CLUSTERS:
316     IF BCTOL
317        L = 0
318        DO I=1, NC
319           CLI = CL(I)
320           NOM = CL(I)
321           DO J=1, CLI
322              L = L + 1
323              PL = PO(L)
324              DATA(BCHAN,PL) = I
325              WH = DATA(C21,PL)
326              S = IT(DATA(C19,PL))
327              IF WH ¬= 101
328                 NP(I,S) = NP(I,S) + PIX(PL)
329                 NW(I,S) = NW(I,S) + PIX(PL)*WH
330              END IF
331           END DO
332        END DO
333        GO TO 7
334     END IF
335  *
336  *  TOLERANCE BLOCK DEBUGGING PRINTOUT
337     IF DEBUGT
338        WRITE (8, "'0DATA LIST'/")
339        DO L=1, QNSS
340           WRITE (8,106) L, DATA(C14,L), DATA(C15,L), DATA(C21,L),
341       1      (FDATA(J,L), J = 1,NDAT)
342  106      FORMAT (I5, I7, I8, I6, F7.0, 18F5.0)
343        END DO
344        WRITE (8, "'0SORTED DATA LIST'/")
345     END IF
346  *
347  *  COMPUTE TOLERANCE BLOCK MEANS AS SEEDS FOR B CLUSTERING
348     L = 0
349     DO I=1, NC
350        CLI = CL(I)
351        CALL ZERO (9, X(1), X(NDAT))
352        NPIX = 0
353        WHPER = 0.
354        DO J=1, CLI
355           L = L + 1
356           PL = PO(L)
357           NPIX = NPIX + PIX(PL)
358           DO K = 1,NDAT; X(K) = X(K) + FDATA(K,PL)*PIX(PL); END
359           IF DEBUGT
360              WHPER = WHPER + PIX(PL)*DATA(C21,PL)
361              WRITE (8,107) PL, PIX(PL), DATA(C21,PL), (FDATA(K,PL),
362       1         K=1,NDAT)
363  107         FORMAT (I5, 2I7, 3X, 20F5.0)
364           END IF
365        END DO
366        FPIX = NPIX
367        CON(I) = 0.
368        DO K=1, NDAT
369           X(K) = X(K)/FPIX
370           YK = X(K)*WT(K)
371           MEAN(K,I) = -2.*YK
372           CON(I) = CON(I) + YK*YK
373        END DO
374        IF DEBUGT
375           WHPER = WHPER/FPIX
376           WRITE (8,108) I, NPIX, WHPER, (X(K), K=1,NDAT)
377  108      FORMAT ('0', I4, I7, F7.0, 3X, 20F5.0)
378        END IF
379        WRITE (8,108)
380        UPNO(I) = UPTOL
381     END DO
382  *
383  ENDTOL:


Figure A.2. LISTING OF XTRAN CODE FOR TOLERANCE BLOCK CONSTRUCTION





265: Loop through the channels specified for constructing blocks,
TOL(M), M = 1,...,NTOL.

266, 279, 307: J is the index that specifies the part of the PO
vector that corresponds to a data group. It starts at 0 and is
incremented by CL(I), the number of blobs in data group I.

271: Loop through the data groups I = 1,...,NC.

275: V is built up of the data values of channel TOL(M) from
blobs in data group I.

279: VSORTP is a handy subroutine from the International Mathematical
and Statistical Libraries that efficiently sorts a vector such as V and,
at the same time, permutes another vector, here PO(J+1)...PO(J+CL(I)),
the part of the PO vector corresponding to data group I.

281-305: Divide data group I into subgroups of nearly equal pixel
size. The output is building onto a long vector, CC, of subgroup sizes,
starting with the first data group, and updating NCC, the number of
subgroups so far. NCC was set equal to 0 at 267 and is incremented every
time a subgroup is defined. NPIX was computed as the number of pixels
in data group I (276) and LPIX is the number of pixels currently in the
subgroup (289). The idea is to establish a pixel quota QUO, initially
NPIX divided by the number NCLASS(M) of subgroups to be established, and
keep including blobs in the subgroup until the quota is met or
exceeded (290).

At this point, we have to decide whether the current blob L (or more
accurately, the blob identified by PO(J+L)) belongs in the current
subgroup or the next one. If the number of pixels by which the blob
exceeds the quota (LPIX-QUO) is greater than the remaining pixels in the
blob (QUO-OLDPIX), then the blob belongs in the next subgroup. So the
blob index L is set back 1 (292) and the number of pixels in the
subgroup reverts to the number before the too-big blob was
encountered (293).




We don't necessarily update CC and NCC at this point. What if the
too-big blob were the first one in the data group? We wouldn't want to
count an empty subgroup. To avoid this anomaly, we check the index L of
the last blob allowed in the subgroup against OLDL, the index of the
last blob in the previously-defined subgroup. If L > OLDL (295-298),
then the subgroup is non-empty. We define it by incrementing the count
NCC of subgroups and appending the number, L-OLDL, of blobs in the
subgroup to the CC vector.

The new quota QUO is formed by dividing the number of pixels left 
in the group NPIX-SUMPIX by the number NLEFT of subgroups to be defined. 

It may be that some groups with one or two large blobs in them cannot be 
fully divided into NCLASS(M) subgroups. 
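
The construction described in these notes can be sketched in Python as follows. This is an illustrative re-expression, not the XTRAN program: the function and variable names are invented here, blobs are represented as hypothetical (pixel count, data vector) pairs, Python's `sorted` stands in for VSORTP, and floating-point division is assumed for the quota.

```python
def split_equal_pixels(pixels, n_sub):
    """Split sorted blobs into subgroups of nearly equal pixel size.

    pixels: pixel counts of the blobs in one data group, already sorted
    by the blocking channel. Returns the number of blobs per subgroup.
    """
    total = sum(pixels)
    sizes = []
    quota = total / n_sub  # QUO (float division is an assumption)
    in_sub = 0             # LPIX: pixels in the current subgroup
    assigned = 0           # SUMPIX: pixels in finished subgroups
    last = 0               # OLDL: blobs placed in finished subgroups
    left = n_sub           # NLEFT: subgroups still to be defined
    k = 0                  # L: blobs scanned so far
    while k < len(pixels):
        k += 1
        before = in_sub
        in_sub += pixels[k - 1]
        if in_sub >= quota:
            if in_sub - quota > quota - before:
                k -= 1              # blob belongs in the next subgroup
                in_sub = before
            if k > last:            # skip empty subgroups
                sizes.append(k - last)
            last = k
            left -= 1
            assigned += in_sub
            if left > 0:
                quota = (total - assigned) / left
            in_sub = 0
    return sizes


def tolerance_blocks(blobs, channels, n_classes):
    """Return tolerance blocks as lists of blob indices.

    blobs: list of (pixel_count, data_vector) pairs.
    """
    groups = [list(range(len(blobs)))]
    for ch, n_sub in zip(channels, n_classes):
        new_groups = []
        for group in groups:
            group = sorted(group, key=lambda b: blobs[b][1][ch])
            sizes = split_equal_pixels([blobs[b][0] for b in group], n_sub)
            start = 0
            for size in sizes:
                new_groups.append(group[start:start + size])
                start += size
        groups = new_groups
    return groups


# Hypothetical data: eight 10-pixel blobs with two channel values each.
blobs = [(10, (i, (3 * i) % 8)) for i in range(8)]
blocks = tolerance_blocks(blobs, channels=(0, 1), n_classes=(2, 2))
print(blocks)
```

A group consisting of a single large blob, such as `split_equal_pixels([100], 3)`, yields one subgroup rather than three, which is the incomplete division mentioned above.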

310: MOVER simply moves CC(1),...,CC(NCC) into the space formerly
occupied by the CL vector. It is the new CL.

313: NCELL, used later in the program as the number of clusters,
is set equal to the number of tolerance blocks defined.

315-334: The switch BCTOL is set "true" when the tolerance blocks
are to be the clusters. Then this section of code is enabled rather
than the usual clustering mechanism, which is located beyond the
tolerance block calculations. This section has to do, therefore, all
the chores the clustering mechanism has to perform: the cluster number
is included in the data array as the user-specified channel BCHAN (324)
and certain running totals are computed to make possible the
calculation of the reduction of variance factors (325-330).

340: C14 and C15 are the data channels specifying the number of
pixels in Blob L. C21 is the ground truth channel, whose value is an
integer between 0 and 101 giving the percent wheat in Blob L. A value
of 101 means the ground truth is unknown.
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347- 381: The means of the data values in all channels are computed 

for the tolerance blocks, regardless of how many channels were used in 
the construction of the blocks. This section also contains debugging 
printout in the sections enabled by the user-set switch DEBUGT.

351: NDAT is the number of data channels containing multispectral
data. X(1),...,X(NDAT), the block mean, is initialized to 0.

348-355: Index I here runs through all the blocks and J runs
through all the blobs in each block while L is counting through the
blobs as a linear index.

358: The block mean X is a pixel mean computed by weighting the
blob channel values by the number of pixels in the blob.

360: WHPER is the percent wheat in the tolerance block. It is
computed only for debugging printout. In this statement, WHPER is
updated by 100 times the number of wheat pixels in the blob (i.e., the
number of pixels in the blob times the percent wheat in the blob). In
375, the cumulated WHPER is then divided by the total number of pixels
in the block (357 and 366) to get the wheat percent in the block.

361: In this debugging printout is the original index of the
blob, the number of pixels in the blob, the wheat percent in the blob
and the mean data vector for the blob (computed as always from the
interior pixels).

371, 372: The block mean is stored as a cluster seed for later
use in the clustering program. The cluster constant CON(I) and the
multiplication of the mean by WT(K) and -2 are peculiarities of the
clustering program. [Instead of computing $(X - \bar{X}_i)^2$ for the
data point $X$ and the cluster mean $\bar{X}_i$, the program multiplies
it out and computes $X^2 - 2X\bar{X}_i + \bar{X}_i^2$ for each cluster.
Because $X^2$ is the same for each cluster, it is omitted and the $i$
minimizing $-2X\bar{X}_i + \bar{X}_i^2$ is chosen. CON(I) is
$\bar{X}_i^2$. WT(K) is to allow for weighted clustering.]
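
The seed computation and the expanded distance can be sketched as below. The names are invented for this illustration, and one detail is an assumption not stated in the notes: the data point is taken to be multiplied by the same channel weights before the comparison, so that the score equals the weighted squared distance minus a term common to all clusters.

```python
def block_mean(blobs):
    """Pixel-weighted mean data vector of a block (note 358).

    blobs: list of (pixel_count, channel_values) pairs.
    """
    total = sum(p for p, _ in blobs)
    n_channels = len(blobs[0][1])
    return [sum(p * v[k] for p, v in blobs) / total for k in range(n_channels)]


def make_seed(mean, weights):
    """Return (MEAN, CON) for one cluster: -2 * weighted mean and its square."""
    y = [m * w for m, w in zip(mean, weights)]
    return [-2.0 * v for v in y], sum(v * v for v in y)


def nearest_seed(x, weights, seeds):
    """Index of the seed minimizing -2*X*Xi + Xi^2 (X^2 is omitted)."""
    xw = [a * w for a, w in zip(x, weights)]  # assumed: point is weighted too

    def score(seed):
        mean_term, con = seed
        return sum(a * m for a, m in zip(xw, mean_term)) + con

    return min(range(len(seeds)), key=lambda i: score(seeds[i]))


# Hypothetical two-channel example with unit weights.
weights = [1.0, 1.0]
seeds = [make_seed(m, weights) for m in ([0.0, 0.0], [10.0, 10.0])]
print(block_mean([(10, (1.0, 2.0)), (30, (5.0, 6.0))]))
print(nearest_seed([2.0, 3.0], weights, seeds), nearest_seed([8.0, 9.0], weights, seeds))
```

Storing -2 times the weighted mean and the constant once per cluster leaves only one dot product and one addition per point-cluster comparison.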






376: In this debugging printout is the block number, the number
of pixels in the block, the wheat percent in the block and the mean data
vector for the block.
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APPENDIX B 

EFFECTS OF RANDOM STRATIFICATION 
(W. Frank Pont) 


B.1 INTRODUCTION

It is shown in Section 3 that the stratified random sampling vari- 
ance could be larger than the simple random sampling variance of the 
same size when sampling without replacement. It should be pointed out 
that there are two conflicting factors which affect the sample variance. 
These are: 

1. Grouping the population elements into strata whose proportions 
of grain are possibly closer to 0 and 1 than the proportion of 
grain in the population. This factor tends to lower the 
variance of the stratified proportion estimate compared to 
the variance of the unstratified proportion estimate. 

2. The number of samples which are obtainable in stratified sampling
without replacement is smaller than the number of samples
obtainable in simple random sampling of the same size. This factor
tends to increase stratified sampling variance compared to the
variance of the unstratified proportion estimate. Sometimes,
factor 2 outweighs factor 1.

R. Kauth pointed out that while stratification based on spectral 
variables could be defined in such a way as to make the proportion of 
grain in each stratum nearly equal (hence defeat the purpose of stratify- 
ing) , it is very unlikely that stratification which assigns elements 
which look alike into the same stratum would have this effect. The worst 
that should happen in spectral stratification is that the spectral char- 
acteristics of the elements might have nothing to do with the true label, 
in which case the stratification would be random with respect to the true 
labels. Thus, his conjecture was: The probability structure and sampling
variance of random stratification followed by stratified sampling is the
same as simple random sampling of the same sample size. Using a
restricted definition of random stratification, we show that the
probabilities of obtaining a given sample are the same under both
sampling models. However, the variance of the simple random sampling
proportion estimate is smaller unless the stratified sampling is
proportional to size.


B.2 NOTATION AND CONVENTIONS 

We assume that a sample of size n is to be selected from a population
of size N. We assume that the elements of the population, b=1,2,3,...,N,
are assigned at random in such a way that there are Q strata, denoted as
s=1,2,3,...,Q, of size $N_1, N_2, N_3, \ldots, N_Q$. In the stratified
sampling, $n_s$ elements are to be selected from stratum s. We note the
two relations:

$$N = N_1 + N_2 + \cdots + N_Q$$

and

$$n = n_1 + n_2 + \cdots + n_Q$$

A stratification is a function which associates every element b with some
stratum s=1,2,...,Q with the above restrictions. Formally, the function

$$i : \{1, 2, 3, \ldots, N\} \to \{1, 2, 3, \ldots, Q\}$$

is a stratification if, for s=1,2,3,...,Q, the cardinality of the set
$\{b : i(b) = s\}$ is $N_s$. There are

$$\binom{N}{N_1, N_2, \ldots, N_Q} = \frac{N!}{N_1!\, N_2! \cdots N_Q!}$$

possible stratifications. We use i to denote a fixed stratification and
use I to denote a stratification chosen at random from all possible
stratifications. We also view a sample as a function which tells us
whether an element b is in the sample or not. The sampling function is

$$j : \{1, 2, 3, \ldots, N\} \to \{0, 1\}$$

$$j(b) = \begin{cases} 1 & \text{if } b \text{ is in the sample,} \\ 0 & \text{otherwise.} \end{cases}$$
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There are 



$$\binom{N}{n}$$

possible simple random samples and

$$\prod_{s=1}^{Q} \binom{N_s}{n_s}$$

stratified random samples with respect to a fixed stratification i. We
use j to denote a fixed sample (one which has already been chosen) and
J to denote a sample which is to be randomly selected from all possible
samples.


A sample function j and a stratification function i are compatible,
denoted $i \sim j$, if sample j could have been obtained for the
stratification i. That is, the cardinality of the set
$\{b : i(b) = s \text{ and } j(b) = 1\}$ is $n_s$ for every s=1,2,...,Q.


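We assume every element b has a label 1 or 0 (grain or non-grain in
our case) denoted as L(b). Since we are not only choosing the sample j
at random but also the stratification, we need to define the probabilities
associated with I, J and (I,J).

$$P_I(i) = \binom{N}{N_1, N_2, \ldots, N_Q}^{-1}$$

is the probability that i is chosen as the stratification;

$$P_J(j) = \binom{N}{n}^{-1}$$

is the probability that sample j is chosen;

$$P_{I|J}(i|j) = \left[ \binom{n}{n_1, n_2, \ldots, n_Q} \binom{N-n}{N_1-n_1, N_2-n_2, \ldots, N_Q-n_Q} \right]^{-1}$$

if $i \sim j$. Otherwise, $P_{I|J}(i|j) = 0$.
This is the conditional probability that stratification i is chosen given
that sample j has been chosen. This result can be obtained by direct
counting or by the use of Bayes theorem.

$$P_{J|I}(j|i) = \prod_{s=1}^{Q} \binom{N_s}{n_s}^{-1}$$

if $i \sim j$. Otherwise, $P_{J|I}(j|i) = 0$. This
is the conditional probability that sample j is selected given that
stratification i has been chosen.

The joint probability can be defined as

$$P_{IJ}(i,j) = P_I(i)\, P_{J|I}(j|i) \quad \text{or} \quad P_{IJ}(i,j) = P_J(j)\, P_{I|J}(i|j).$$

In the first case, for $i \sim j$,

$$P_{IJ}(i,j) = \binom{N}{N_1, \ldots, N_Q}^{-1} \prod_{s=1}^{Q} \binom{N_s}{n_s}^{-1}
= \frac{\prod_{s=1}^{Q} N_s!}{N!} \prod_{s=1}^{Q} \frac{n_s!\,(N_s-n_s)!}{N_s!}
= \frac{\prod_{s=1}^{Q} n_s!\,(N_s-n_s)!}{N!}$$

In the latter case, for $i \sim j$,

$$P_{IJ}(i,j) = \binom{N}{n}^{-1} \left[ \binom{n}{n_1, \ldots, n_Q} \binom{N-n}{N_1-n_1, \ldots, N_Q-n_Q} \right]^{-1}
= \frac{n!\,(N-n)!}{N!} \cdot \frac{\prod_{s=1}^{Q} n_s!}{n!} \cdot \frac{\prod_{s=1}^{Q} (N_s-n_s)!}{(N-n)!}
= \frac{\prod_{s=1}^{Q} n_s!\,(N_s-n_s)!}{N!}$$
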
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which is consistent with the first definition. Note in both cases. 


$$P_{IJ}(i,j) = 0$$

if $i \not\sim j$.

Now that we have defined the joint probability of I and J, we can
view $P_I$ and $P_J$ as marginal probabilities, that is

$$P_I(i) = \sum_{j:\, i \sim j} P_{IJ}(i,j)$$

and

$$P_J(j) = \sum_{i:\, i \sim j} P_{IJ}(i,j)$$

A consequence of this is that the probability of obtaining a fixed sample 
j is the same in simple random sampling and in random stratification fol- 
lowed by stratified random sampling. 
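
This consequence can be checked by brute-force enumeration on a small population. The sketch below is illustrative only: the function names and the population (N = 5, strata sizes 2 and 3, stratum sample sizes 1 and 2) are chosen here, and exact fractions are used so the comparison with $\binom{N}{n}^{-1}$ is exact.

```python
from fractions import Fraction
from itertools import combinations, product


def stratifications(sizes):
    """Yield every assignment of elements to strata with the given sizes."""
    n_elements = sum(sizes)
    for labels in product(range(len(sizes)), repeat=n_elements):
        if all(labels.count(s) == size for s, size in enumerate(sizes)):
            yield labels


def stratified_samples(labels, n_per):
    """Yield every stratified sample compatible with one stratification."""
    members = [[b for b, s in enumerate(labels) if s == q]
               for q in range(len(n_per))]
    for pick in product(*(combinations(m, k) for m, k in zip(members, n_per))):
        yield frozenset(b for group in pick for b in group)


def sample_distribution(sizes, n_per):
    """Marginal probability of each sample under random stratification
    followed by stratified random sampling."""
    strats = list(stratifications(sizes))
    dist = {}
    for labels in strats:
        samples = list(stratified_samples(labels, n_per))
        for j in samples:
            joint = Fraction(1, len(strats) * len(samples))  # P_IJ(i, j)
            dist[j] = dist.get(j, Fraction(0)) + joint
    return dist


dist = sample_distribution(sizes=(2, 3), n_per=(1, 2))
print(len(dist), set(dist.values()))
```

Every sample of size 3 from the 5 elements receives the same marginal probability, which is what simple random sampling would assign.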


We now examine the two estimates. The simple random sample
proportion P depends only on j, namely

$$P(j) = \frac{1}{n} \sum_{b:\, j(b)=1} L(b)$$

Given a stratification i, we can introduce i into this relation

$$P(j) = \sum_{s=1}^{Q} \frac{1}{n} \sum_{\substack{b:\, i(b)=s \\ j(b)=1}} L(b) = \sum_{s=1}^{Q} \frac{n_s}{n} P_s(i,j) \tag{1}$$

where

$$P_s(i,j) = \frac{1}{n_s} \sum_{\substack{b:\, i(b)=s \\ j(b)=1}} L(b)$$

The stratified random sample proportion P is defined by

$$P(i,j) = \sum_{s=1}^{Q} \frac{N_s}{N} P_s(i,j) \tag{2}$$

We note that (1) and (2) are equal if

$$\frac{n_s}{n} = \frac{N_s}{N}$$

where s=1,2,3,...,Q; i.e., if we are able to sample proportional to size.


B.3 SAMPLE VARIANCE UNDER THE TWO MODELS 

In this section, the letter "E" will stand for "expectation of"
and "V" for "variance of".

$$E_J(P(J)) = P$$

where

$$P = \frac{1}{N} \sum_{b=1}^{N} L(b)$$

$$E_{I|J}\, P(I,J) = P(J) \tag{3}$$

$$\begin{aligned}
V(P(I,J)) &= E_{IJ}(P(I,J) - P)^2 \\
&= E_{IJ}(P(I,J) - P(J) + P(J) - P)^2 \\
&= E_{IJ}(P(I,J) - P(J))^2 + 2E_{IJ}((P(I,J) - P(J))(P(J) - P)) + E_{IJ}(P(J) - P)^2 \\
&= E_{IJ}(P(I,J) - P(J))^2 + 2E_{IJ}((P(I,J) - P(J))(P(J) - P)) + V(P(J))
\end{aligned} \tag{4}$$

$$\begin{aligned}
E_{IJ}[(P(I,J) - P(J))(P(J) - P)]
&= \sum_{(i,j):\, i \sim j} [(P(i,j) - P(j))(P(j) - P)]\, P_{IJ}(i,j) \\
&= \sum_{j} \left[ \sum_{i:\, i \sim j} (P(i,j) - P(j))\, P_{I|J}(i|j) \right] (P(j) - P)\, P_J(j)
\end{aligned} \tag{5}$$

$$\begin{aligned}
\sum_{i:\, i \sim j} (P(i,j) - P(j))\, P_{I|J}(i|j)
&= \sum_{i:\, i \sim j} \sum_{s=1}^{Q} \left( \frac{N_s}{N} - \frac{n_s}{n} \right) \frac{1}{n_s} \sum_{\substack{b:\, i(b)=s \\ j(b)=1}} L(b)\; P_{I|J}(i|j) \\
&= \sum_{s=1}^{Q} \left( \frac{N_s}{N} - \frac{n_s}{n} \right) \sum_{i:\, i \sim j} \frac{1}{n_s} \sum_{\substack{b:\, i(b)=s \\ j(b)=1}} L(b)\; P_{I|J}(i|j) \\
&= \sum_{s=1}^{Q} \left( \frac{N_s}{N} - \frac{n_s}{n} \right) P(j) \\
&= 0
\end{aligned} \tag{6}$$

We now have from (4), (5) and (6)

$$V(P(I,J)) = E_{IJ}(P(I,J) - P(J))^2 + V(P(J)) \ge V(P(J))$$

$$V(P(I,J)) = V(P(J)), \quad \text{if } \frac{n_s}{n} = \frac{N_s}{N}$$

We conclude that the variance is increased by stratification which
is random with respect to the labels unless the strata sample sizes are
proportional to strata size, i.e.,

$$\frac{n_s}{n} = \frac{N_s}{N}$$